TIPSTER Program Overview

Author

  • Roberta H. Merchant
Abstract

The task of TIPSTER Phase I was to advance the state of the art in two language technologies: Document Detection and Information Extraction. Document Detection includes two subtasks: routing (running static queries against a stream of new data) and retrieval (running ad hoc queries against archival data). Information Extraction is a technology in which pre-specified types of information are located within free text, extracted, and placed within a database.

Before TIPSTER, users searching large volumes of data with many queries had few information retrieval tools other than the Boolean keyword search systems that had been developed more than a decade earlier. These Boolean systems are characterized by:

  • low recall (the user loses an unknown quantity of useful information because the system is unable to retrieve many of the relevant documents)
  • low precision (the user has to read a very large number of irrelevant documents which the system has mistakenly retrieved)
  • no ranking or prioritization (the user must scan the entire list of retrieved documents because a good document is just as likely to be at the end of the list as at the beginning)
  • exact matches (the user must generate variant spellings or alternate word choices by hand because there are no built-in rules for adding variants)
  • hand-built queries (the user has to understand how the system works and the syntax of queries in order to use the system)

3. DOCUMENT DETECTION DELIVERABLES IN PHASE II

As a result of algorithm development in Phase I, prototype systems will be built during TIPSTER Phase II, giving the user Document Detection tools that feature the technology developed in Phase I:

  • improved recall (comparative evaluation of systems in TIPSTER and TREC [1] has demonstrated higher recall of relevant documents)
  • improved precision (the user will read fewer useless documents in order to find the ones he wants)
  • ranked retrievals (the user reviews documents statistically ranked according to how well they match the query, thus improving the chances that the most useful documents will be near the top of the queue)
  • query expansion (the system, not the user, automatically expands queries to draw in more relevant documents by using concept-based tools such as thesauri)
  • automatic query generation (the system uses a natural language description of the subject supplied by the user to generate queries)

4. THE …
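
The recall and precision measures named in the abstract can be illustrated with a minimal sketch. It uses the standard set-based definitions rather than anything taken from the TIPSTER evaluation procedure itself; the function name `precision_recall` and the document identifiers are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the system returned (hypothetical ids).
    relevant:  document ids judged relevant to the query.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents actually retrieved

    # Precision: fraction of retrieved documents that are relevant.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Four documents retrieved, three documents relevant, two in common.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case half of the retrieved documents are useful (low precision means more irrelevant reading), and one relevant document, d7, is never seen by the user (low recall means lost information).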

Similar resources

The TIPSTER Text Program Overview

These TIPSTER Phase III Proceedings bring to a close a program that had significant impact on information technology. Since 1991, the TIPSTER Text program has fostered the advancement of state-of-the-art technologies for text handling through the efforts of researchers and developers in the U.S. Government, industry and academia. The resulting capabilities are being deployed throughout the intel...

TIPSTER Phase I Final Report

Overview: During Phase I of the TIPSTER program, HNC developed a unique approach to machine learning of similarity of meaning. This approach, embodied in a system called "MatchPlus", exploits this learned similarity of meaning for concept-based text retrieval, routing and visualization of textual information. MatchPlus uses an information representation scheme called "context vectors" to encode...

Automatic Text Summarization in TIPSTER

Automatic Text Summarization was added as a major research thrust of the TIPSTER program during TIPSTER Phase III, 1996-1998. It is a natural extension of the previously supported research efforts in Information Extraction (IE) and Information Retrieval (IR). There is considerable interest in automatically producing summaries due, in large part, to the growth of the Internet and the World Wide ...

An Overview of the Prototype Information Dissemination System (PRIDES)

The Prototype Information Dissemination System (PRIDES) is a TIPSTER technology insertion project sponsored by the Office of Research and Development (ORD). PRIDES applies a portion of the TIPSTER detection architecture and several TIPSTER components to the problem of timely dissemination of Foreign Broadcast Information Service (FBIS) articles. When PRIDES begins operation in July 1996, it wil...

TIPSTER Program History

The third thread is the sponsorship of the international Message Understanding Conferences (MUC's) and Text Retrieval Conferences (TREC's). These conferences, which evaluated the state of the art and promoted text-processing R&D outside of the TIPSTER Text contracts, were organized by NRaD and NIST. MUC-1 and MUC-2 preceded and set the stage for TIPSTER, before the sponsorship of these conferen...

TIPSTER Phase III Goals

The primary goal of TIPSTER Phase III is to promote advancements in text processing technologies. To accomplish this goal, the TIPSTER Program will continue to encourage the cooperation of researchers and developers in government, industry and academia to achieve a balanced overall program. The Phase III framework is modeled on that of Phase II and will consist of four basic components: (1) Adv...

Publication date: 1993